{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# Quantsim and Adaround - Per Channel Quantization (PCQ)\n",
    "\n",
    "This notebook illustrates the use of AIMET Adaround feature.\n",
    "\n",
    "AIMET quantization features, by default, use the \"nearest rounding\" technique for achieving quantization. When using the \"nearest rounding\" technique, the weight value is quantized to the nearest integer value. The Adaptive Rounding (AdaRound) feature, uses a smaller subset of the unlabelled training data to adaptively round the weights. AdaRound, optimizes a loss function using the unlabelled training data to adaptively decide whether to quantize a specific weight to the integer value near it or away from it. Using the AdaRound quantization, a model is able to achieve an accuracy closer to the FP32 model, while using low bit-width integer quantization.\n",
    "\n",
    "\n",
    "**Per Channel Quantization**\n",
    "AIMET by default performs quantization on a per tensor basis. However, for better quantization performance (specifically networks with many convolutions) quantization can be done on a per channel quantization basis.\n",
    "\n",
    "The difference between per tensor and per channel quantization can be illustrated with a single 2D convolution layer in Keras. Imagine that we have a 2D convolution layer with 64 filters, a kernel size of 3, and input of shape of (28, 28, 1). If we were to per tensor quantize this layer, we would look at the entirety of the convolutions weight matrix (convolution kernel). From here, we would take its overall max, overall min, and compute the encoding to quantize the entire matrix. In contrast, on a per channel basis, we would repeat the process of finding a min, max, and computing an encoding. However, we would repeat this for every output channel in the convolution. In this example, that would mean we would have 64 encodings that are unique to each channel. This more detailed calculation is what attributes to better performance in quantization.\n",
    "\n",
    "*Example Keras Conv2D Layer*\n",
    "```python\n",
    "from tensorflow.keras import layers\n",
    "conv2d_layer = layers.Conv2D(filters=64, kernel_size=3)\n",
    "```\n",
    "\n",
    "\n",
    "#### Overall flow\n",
    "This notebook covers the following\n",
    "1. Instantiate the example evaluation and training pipeline\n",
    "2. Load the FP32 model and evaluate the model to find the baseline FP32 accuracy\n",
    "3. Apply PCQ Adaround and get corresponding model\n",
    "4. Create a PCQ quantization simulation model (with fake quantization ops inserted) from the Adaround model and evaluate this simuation model to get a quantized accuracy score\n",
    "5. Exporting the simulation models encodings and how to take them to SNPE/QNN\n",
    "\n",
    "\n",
    "#### What this notebook is not\n",
    "* This notebook is not designed to show state-of-the-art results. For example, it uses a relatively quantization-friendly model like Resnet50. Also, some optimization parameters are deliberately chosen to have the notebook execute more quickly.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "## Dataset\n",
    "\n",
    "This notebook relies on the ImageNet dataset for the task of image classification. If you already have a version of the dataset readily available, please use that. Else, please download the dataset from appropriate location (e.g. https://image-net.org/challenges/LSVRC/2012/index.php#).\n",
    "\n",
    "**Note**: To speed up the execution of this notebook, you may use a reduced subset of the ImageNet dataset. E.g. the entire ILSVRC2012 dataset has 1000 classes, 1000 training samples per class and 50 validation samples per class. But for the purpose of running this notebook, you could perhaps reduce the dataset to say 2 samples per class. This exercise is left upto the reader and is not necessary.\n",
    "\n",
    "Edit the cell below and specify the directory where the downloaded ImageNet dataset is saved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "DATASET_DIR = \"/path/to/dataset/dir/\"          # Please replace this with a real directory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "\n",
    "## 1. Example evaluation and training pipeline\n",
    "\n",
    "The following is an example training and validation loop for this image classification task.\n",
    "\n",
    "- **Does AIMET have any limitations on how the training, validation pipeline is written?** Not really. We will see later that AIMET will modify the user's model to create a QuantizationSim model which is still a Keras model. This QuantizationSim model can be used in place of the original model when doing inference or training.\n",
    "- **Does AIMET put any limitation on the interface of the evaluate() or train() methods?** Not really. You should be able to use your existing evaluate and train routines as-is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from Examples.common import image_net_config\n",
    "from Examples.tensorflow.utils.keras.image_net_dataset import ImageNetDataset\n",
    "from Examples.tensorflow.utils.keras.image_net_evaluator import ImageNetEvaluator\n",
    "\n",
    "\n",
    "class ImageNetDataPipeline:\n",
    "    \"\"\"\n",
    "    Provides APIs for model evaluation and finetuning using ImageNet Dataset.\n",
    "    \"\"\"\n",
    "\n",
    "    @staticmethod\n",
    "    def get_val_dataset() -> tf.data.Dataset:\n",
    "        \"\"\"\n",
    "        Instantiates a validation dataloader for ImageNet dataset and returns it\n",
    "        :return: A tensorflow dataset\n",
    "        \"\"\"\n",
    "        data_loader = ImageNetDataset(DATASET_DIR,\n",
    "                                      image_size=image_net_config.dataset['image_size'],\n",
    "                                      batch_size=image_net_config.evaluation['batch_size'])\n",
    "\n",
    "        return data_loader\n",
    "\n",
    "    @staticmethod\n",
    "    def evaluate(model, iterations=None) -> float:\n",
    "        \"\"\"\n",
    "        Given a Keras model, evaluates its Top-1 accuracy on the validation dataset\n",
    "        :param model: The Keras model to be evaluated.\n",
    "        :param iterations: The number of iterations to run. If None, all the data will be used\n",
    "        :return: The accuracy for the sample with the maximum accuracy.\n",
    "        \"\"\"\n",
    "        evaluator = ImageNetEvaluator(DATASET_DIR,\n",
    "                                      image_size=image_net_config.dataset[\"image_size\"],\n",
    "                                      batch_size=image_net_config.evaluation[\"batch_size\"])\n",
    "\n",
    "        return evaluator.evaluate(model=model, iterations=iterations)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "\n",
    "## 2. Load the model and evaluate to get a baseline FP32 accuracy score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "For this example notebook, we are going to load a pretrained ResNet50 model from Keras. Similarly, you can load any pretrained Keras model instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from tensorflow.keras.applications.resnet50 import ResNet50\n",
    "\n",
    "model = ResNet50(include_top=True,\n",
    "                 weights=\"imagenet\",\n",
    "                 input_tensor=None,\n",
    "                 input_shape=None,\n",
    "                 pooling=None,\n",
    "                 classes=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "Let's determine the FP32 (floating point 32-bit) accuracy of this model using the evaluate() routine\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ImageNetDataPipeline.evaluate(model=model, iterations=10)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "## 3. Create a quantization simulation model and determine quantized accuracy\n",
    "\n",
    "## Fold Batch Normalization layers\n",
    "Before we determine the simulated quantized accuracy using QuantizationSimModel, we will fold the BatchNormalization (BN) layers in the model. These layers get folded into adjacent Convolutional layers. The BN layers that cannot be folded are left as they are.\n",
    "\n",
    "**Note**: For per channel to be enabled, we pass a config file in which the config file has *per_channel_quantization* set to *True*. This config file will be used later on with Adaround as well. Having one place describing the quantization style ensures that we don't mismatch when applying QuantSim and Adaround together.\n",
    "\n",
    "**Why do we need to this?**\n",
    "On quantized runtimes (like TFLite, SnapDragon Neural Processing SDK, etc.), it is a common practice to fold the BN layers. Doing so, results in an inferences/sec speedup since unnecessary computation is avoided. Now from a floating point compute perspective, a BN-folded model is mathematically equivalent to a model with BN layers from an inference perspective, and produces the same accuracy. However, folding the BN layers can increase the range of the tensor values for the weight parameters of the adjacent layers. And this can have a negative impact on the quantized accuracy of the model (especially when using INT8 or lower precision). So, we want to simulate that on-target behavior by doing BN folding here.\n",
    "\n",
    "The following code calls AIMET to fold the BN layers of a given model. </br>\n",
    "**NOTE: During folding, a new model is returned. Please use the returned model for the rest of the pipeline.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from aimet_tensorflow.keras.batch_norm_fold import fold_all_batch_norms\n",
    "from aimet_tensorflow.keras.quantsim import QuantizationSimModel\n",
    "from aimet_common.defs import QuantScheme\n",
    "\n",
    "_, model = fold_all_batch_norms(model)\n",
    "sim = QuantizationSimModel(model=model,\n",
    "                           quant_scheme=QuantScheme.post_training_tf,\n",
    "                           rounding_mode=\"nearest\",\n",
    "                           default_output_bw=8,\n",
    "                           default_param_bw=8,\n",
    "                           config_file=\"Examples/tensorflow/utils/keras/pcq_quantsim_config\")  # NOTE: We tell QuantSim to run per channel quantization through the config file defined here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "Even though AIMET has added 'quantizer' nodes to the model graph but the model is not ready to be used yet. Before we can use the sim model for inference or training, we need to find appropriate scale/offset quantization parameters for each 'quantizer' node. For activation quantization nodes, we need to pass unlabeled data samples through the model to collect range statistics which will then let AIMET calculate appropriate scale/offset quantization parameters. This process is sometimes referred to as calibration. AIMET simply refers to it as 'computing encodings'.\n",
    "\n",
    "So we create a routine to pass unlabeled data samples through the model. This should be fairly simple - use the existing train or validation data loader to extract some samples and pass them to the model. We don't need to compute any loss metric etc. So we can just ignore the model output for this purpose. A few pointers regarding the data samples\n",
    "- In practice, we need a very small percentage of the overall data samples for computing encodings. For example, the training dataset for ImageNet has 1M samples. For computing encodings we only need 500 or 1000 samples.\n",
    "- It may be beneficial if the samples used for computing encoding are well distributed. It's not necessary that all classes need to be covered etc. since we are only looking at the range of values at every layer activation. However, we definitely want to avoid an extreme scenario like all 'dark' or 'light' samples are used - e.g. only using pictures captured at night might not give ideal results.\n",
    "\n",
    "The following shows an example of a routine that passes unlabeled samples through the model for computing encodings. This routine can be written in many different ways, this is just an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.utils import Progbar\n",
    "from tensorflow.keras.applications.resnet import preprocess_input\n",
    "\n",
    "def pass_calibration_data(sim_model, samples):\n",
    "    tf_dataset = ImageNetDataPipeline.get_val_dataset()\n",
    "    dataset = tf_dataset.dataset\n",
    "    batch_size = tf_dataset.batch_size\n",
    "\n",
    "    progbar = Progbar(samples)\n",
    "\n",
    "    batch_cntr = 0\n",
    "    for inputs, _ in dataset:\n",
    "        sim_model(preprocess_input(inputs))\n",
    "\n",
    "        batch_cntr += 1\n",
    "        progbar_stat_update = \\\n",
    "            batch_cntr * batch_size if (batch_cntr * batch_size) < samples else samples\n",
    "        progbar.update(progbar_stat_update)\n",
    "        if (batch_cntr * batch_size) > samples:\n",
    "            break\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "Now we call AIMET to use the above routine to pass data through the model and then subsequently compute the quantization encodings. Encodings here refer to scale/offset quantization parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sim.compute_encodings(forward_pass_callback=pass_calibration_data,\n",
    "                      forward_pass_callback_args=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "Now the QuantizationSim model is ready to be used for inference. First we can pass this model to the same evaluation routine we used before. The evaluation routine will now give us a simulated quantized accuracy score for INT8 quantization instead of the FP32 accuracy score we saw before."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "## 4. Apply Adaround\n",
    "\n",
    "We can now apply AdaRound to this model in a per channel quantization fashion.\n",
    "\n",
    "**Note**: For per channel to be enabled, we pass a config file to `apply_adaround` in which the config file has *per_channel_quantization* set to *True* just as we did with QuantSim.\n",
    "\n",
    "Some of the parameters for AdaRound are described below\n",
    "\n",
    "- **data_set:**  AdaRound needs a dataset to use data samples for the layer-by-layer optimization to learn the rounding vectors. Either a training or validation dataloader could be passed in.\n",
    "- **num_batches:** The number of batches used to evaluate the model while calculating the quantization encodings. Typically we want AdaRound to use around 2000 samples. So with a batch size of 32, this may translate to 64 batches. To speed up the execution here we are using a batch size of 1.\n",
    "- **default_num_iterations:** The number of iterations to adaround each layer. Default value is set to 10000 and we strongly recommend to not reduce this number. But in this example we are using 32 to speed up the execution runtime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from tensorflow.keras.applications.resnet import preprocess_input\n",
    "from tensorflow.keras.preprocessing import image_dataset_from_directory\n",
    "from aimet_tensorflow.keras.adaround_weight import Adaround, AdaroundParameters\n",
    "from aimet_common.defs import QuantScheme\n",
    "\n",
    "ada_round_data = image_dataset_from_directory(directory=DATASET_DIR,\n",
    "                                              labels=\"inferred\",\n",
    "                                              label_mode=\"categorical\",\n",
    "                                              batch_size=image_net_config.evaluation[\"batch_size\"],\n",
    "                                              shuffle=False,\n",
    "                                              image_size=(image_net_config.dataset[\"image_width\"],\n",
    "                                                          image_net_config.dataset[\"image_height\"]))\n",
    "ada_round_data = ada_round_data.map(lambda x, y: preprocess_input(x))\n",
    "\n",
    "params = AdaroundParameters(data_set=ada_round_data,\n",
    "                            num_batches=1, default_num_iterations=32)\n",
    "\n",
    "os.makedirs(\"./output/\", exist_ok=True)\n",
    "ada_model = Adaround.apply_adaround(model, params, path=\"output\", filename_prefix=\"adaround\",\n",
    "                                    default_param_bw=8, default_quant_scheme=QuantScheme.post_training_tf,\n",
    "                                    config_file=\"Examples/tensorflow/utils/keras/pcq_quantsim_config\")  # NOTE: The same config file used in QuantSim is used here as well. Again, telling Adaround to enable PCQ.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "Now, we can determine the simulated quantized accuracy of the model after applying Adaround. We again create a simulation model like before and evaluate to determine simulated quantized accuracy.\n",
    "\n",
    "**Note:** There are two important things to understand in the following cell.\n",
    "  - **Parameter Biwidth Precision**: The QuantizationSimModel must be created with the same parameter bitwidth precision that was used in the apply_adaround() created.\n",
    "\n",
    "  - **Freezing the parameter encodings**:\n",
    "After creating the QuantizationSimModel, the set_and_freeze_param_encodings() API must be called\n",
    "before calling the compute_encodings() API. While applying AdaRound, the parameter values have\n",
    "been rounded up or down based on these initial encodings internally created. Fo\n",
    "r Quantization Simulation accuracy, it is important to freeze these encodings.\n",
    "If the parameters encodings are NOT frozen, the call to compute_encodings() will alter\n",
    "the value of the parameters encoding and Quantization Simulation accuracy will not be correct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from aimet_tensorflow.keras.quantsim import QuantizationSimModel\n",
    "\n",
    "sim = QuantizationSimModel(model=ada_model,\n",
    "                           quant_scheme=QuantScheme.post_training_tf,\n",
    "                           rounding_mode=\"nearest\",\n",
    "                           default_output_bw=8,\n",
    "                           default_param_bw=8,\n",
    "                           config_file=\"Examples/tensorflow/utils/keras/pcq_quantsim_config\")  # NOTE: The same config file used in the first QuantSim.\n",
    "\n",
    "sim.set_and_freeze_param_encodings(encoding_path=os.path.join(\"output\", 'adaround.encodings'))\n",
    "\n",
    "sim.compute_encodings(forward_pass_callback=pass_calibration_data,\n",
    "                      forward_pass_callback_args=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "Now the QuantizationSim model is ready to be used for inference. First we can pass this model to the same evaluation routine we used before. The evaluation routine will now give us a simulated quantized accuracy score for INT8 quantization instead of the FP32 accuracy score we saw before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ImageNetDataPipeline.evaluate(model=sim.model, iterations=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "Depending on your settings you may have observed a slight gain in accuracy after applying adaround. Of course, this was just an example. Please try this against the model of your choice and play with the number of samples to get the best results.\n",
    "\n",
    "Now the next step would be to take this model to target. For this purpose, we need to export the model with the updated weights without the fake quant ops. And also to export the encodings (scale/offset quantization parameters). AIMET QuantizationSimModel provides an export API for this purpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "sim.export(path='./output', filename_prefix='resnet50_pcq_adaround')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "---\n",
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "This example illustrated how AIMET AdaRound and QuantSim API is invoked to achieve post training quantization on a per channel basis. To use AIMET for your specific needs, replace the model with your model and\n",
    "replace the data pipeline with your data pipeline. This will provide you a quick starting point. As indicated above, some parameters in this example have been chosen in such a way way to make this example execute faster.\n",
    "\n",
    "Hope this notebook was useful for you to understand how to use AIMET for performing Adaround and QuantSim on a per channel basis.\n",
    "\n",
    "Few additional resources\n",
    "- Refer to the AIMET API docs to know more details of the APIs and optional parameters\n",
    "- Refer to the other example notebooks to understand how to use AIMET post-training quantization techniques and QAT techniques"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "3.8.0"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
